Smart Paper Notes

WeLM: A Well-Read Pre-trained Language Model for Chinese

Hui Su , Xiao Zhou , Houjing Yu , Yuwen Chen , Zilin Zhu , Yang Yu , Jie Zhou

Category: Natural Language Processing | Artificial Intelligence

2022-09-21

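Large language models pre-trained with self-supervised learning have shown impressive zero-shot capabilities on a wide variety of tasks. In this work, we present WeLM: a well-read pre-trained language model for Chinese that can seamlessly perform different types of tasks with zero or few demonstrations. WeLM is trained with 10B parameters by "reading" a curated, high-quality corpus covering a wide range of topics. We show that WeLM has broad knowledge of various domains and languages. On 18 monolingual (Chinese) tasks, WeLM significantly outperforms existing pre-trained models of similar size and matches the performance of models up to 25 times larger. WeLM also shows strong multilingual and code-switching understanding, outperforming existing multilingual language models pre-trained on 30 languages. Furthermore, we collected human-written prompts for a large set of supervised Chinese datasets and fine-tuned WeLM with multi-prompted training. The resulting model generalizes strongly to unseen task types and outperforms the unsupervised WeLM in zero-shot learning. Finally, we show that WeLM has basic skills in explaining and calibrating its own decisions, which may be a promising direction for future research. Our model is available via the API at https://welm.weixin.qq.com/docs/api/.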

translated by Google Translate

Grasp Stability Prediction with Sim-to-Real Transfer from Tactile Sensing

Zilin Si , Zirui Zhu , Arpit Agarwal , Stuart Anderson , Wenzhen Yuan

Category: Robotics

2022-08-04

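Robot simulation has been an essential tool for data-driven manipulation tasks. However, most existing simulation frameworks lack efficient and accurate models of physical interaction with tactile sensors, as well as realistic tactile simulation. This makes sim-to-real transfer for tactile-based manipulation tasks still challenging. In this work, we integrate the simulation of robot dynamics and vision-based tactile sensors by modeling contact physics. The contact model uses the simulated contact forces at the robot's end-effector to generate realistic tactile outputs. To close the sim-to-real transfer gap, we calibrate the physics simulator of robot dynamics, the contact model, and the tactile optical simulator with real-world data. We then demonstrate the effectiveness of the system on a zero-shot sim-to-real grasp stability prediction task, achieving an average accuracy of 90.7% across various objects. Experiments reveal the potential of applying our simulation framework to more complex manipulation tasks. We open-source the simulation framework at https://github.com/cmurobotouch/taxim/tree/taxim-robot.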

Cross-Modal Object Tracking: Modality-Aware Representations and A Unified Benchmark

Chenglong Li , Tianhao Zhu , Lei Liu , Xiaonan Si , Zilin Fan , Sulan Zhai

Category: Computer Vision

2021-11-08

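In many visual systems, visual tracking is usually based on RGB image sequences, in which some targets become invalid in low-light conditions, significantly degrading tracking performance. Introducing other modalities such as depth and infrared data is an effective way to handle the imaging limitations of a single source, but multi-modal imaging platforms usually require elaborate designs and currently cannot be deployed in many real-world applications. Near-infrared (NIR) imaging has become an essential part of many surveillance cameras, whose imaging switches between RGB and NIR depending on light intensity. These two modalities are heterogeneous, with very different visual properties, and therefore pose major challenges for visual tracking. However, existing works have not studied this challenging problem. In this work, we address the cross-modal object tracking problem and contribute a new video dataset comprising 654 cross-modal image sequences with over 481K frames in total and an average video length of more than 735 frames. To promote research and development in cross-modal object tracking, we propose a new algorithm that learns modality-aware target representations to mitigate the appearance gap between the RGB and NIR modalities during tracking. It is plug-and-play and can therefore be flexibly embedded into different tracking frameworks. Extensive experiments on the dataset demonstrate the effectiveness of the proposed algorithm in two representative tracking frameworks against 17 state-of-the-art tracking methods. We will release the dataset for free academic use; the dataset download link and code will be released soon.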

PatrickStar: Parallel Training of Pre-trained Models via Chunk-based Memory Management

Jiarui Fang , Zilin Zhu , Shenggui Li , Hui Su , Yang Yu , Jie Zhou , Yang You

Category: Machine Learning

2021-08-12

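Pre-trained models (PTMs) are revolutionizing artificial intelligence (AI) technology. However, the hardware requirements for PTM training are prohibitively high, making it a game for only a small group of people. We therefore propose the PatrickStar system to lower the hardware requirements of PTMs and make them accessible to everyone. PatrickStar uses the CPU-GPU heterogeneous memory space to store model data. Unlike existing works, we organize model data into memory chunks and dynamically distribute them across the heterogeneous memory. Guided by runtime memory statistics collected during a warm-up iteration, chunks are orchestrated efficiently in heterogeneous memory, yielding lower CPU-GPU data transfer volume and higher bandwidth utilization. Working in symbiosis with the Zero Redundancy Optimizer, PatrickStar scales to multiple GPUs on multiple nodes. The system can train tasks with larger models and larger batch sizes that existing works cannot handle. Experimental results show that PatrickStar extends the trainable model scale by 2.27x and 2.5x and consistently achieves higher execution speed. PatrickStar also successfully runs a 175B GPT-3 training task on a 32-GPU cluster. Our code is publicly available at https://github.com/tencent/patrickstar.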
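
A minimal Python sketch of the chunk idea, assuming parameter sizes are given as element counts, a fixed chunk capacity, and a GPU budget expressed as a number of chunks. The names `Chunk`, `pack_into_chunks` and `place_chunks` are hypothetical, and the "first-needed chunks go to the GPU" rule is only a stand-in for PatrickStar's statistics-guided placement; this is not PatrickStar's actual API or policy.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Chunk:
    """A fixed-capacity buffer that holds several parameter tensors (sizes in elements)."""
    capacity: int
    tensors: List[Tuple[str, int]] = field(default_factory=list)
    used: int = 0
    device: str = "cpu"

    def try_add(self, name: str, numel: int) -> bool:
        # Append the tensor only if it fits in the space left in this chunk.
        if self.used + numel > self.capacity:
            return False
        self.tensors.append((name, numel))
        self.used += numel
        return True


def pack_into_chunks(params: List[Tuple[str, int]], chunk_size: int) -> List[Chunk]:
    """Pack (name, numel) pairs into chunks in model order, opening a new chunk when full."""
    chunks = [Chunk(chunk_size)]
    for name, numel in params:
        if numel > chunk_size:
            raise ValueError(f"{name} has {numel} elements, more than chunk size {chunk_size}")
        if not chunks[-1].try_add(name, numel):
            chunks.append(Chunk(chunk_size))
            chunks[-1].try_add(name, numel)
    return chunks


def place_chunks(chunks: List[Chunk], gpu_budget: int, access_order: List[int]) -> List[Chunk]:
    """Assign devices at chunk granularity (hypothetical placement rule).

    `access_order` stands in for warm-up statistics: chunk indices in the order they
    are first needed. The first `gpu_budget` of them go to the GPU, the rest stay on the CPU.
    """
    for chunk in chunks:
        chunk.device = "cpu"
    for idx in access_order[:gpu_budget]:
        chunks[idx].device = "cuda"
    return chunks


if __name__ == "__main__":
    params = [("embed", 6), ("layer0.w", 4), ("layer0.b", 1), ("layer1.w", 4), ("layer1.b", 1)]
    chunks = place_chunks(pack_into_chunks(params, chunk_size=8), gpu_budget=1, access_order=[2, 0, 1])
    for i, c in enumerate(chunks):
        print(i, c.device, c.used, [n for n, _ in c.tensors])
```

Moving whole chunks rather than individual tensors means fewer, larger CPU-GPU transfers, which is the bandwidth argument made in the abstract.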

HUSP-SP: Faster Utility Mining on Sequence Data

Chunkai Zhang , Yuting Yang , Zilin Du , Wensheng Gan , Philip S. Yu

Category: Artificial Intelligence

2022-12-29

High-utility sequential pattern mining (HUSPM) has emerged as an important topic due to its wide applications and considerable popularity. However, because the search space explodes combinatorially when HUSPM is run with a low utility threshold or on large-scale data, solving it can be time-consuming and memory-intensive. Several algorithms have been proposed for this problem, but they still incur high running time and memory usage. In this paper, to solve the problem more efficiently, we design a compact structure called sequence projection (seqPro) and propose an efficient algorithm for discovering high-utility sequential patterns with the seqPro structure (HUSP-SP). HUSP-SP uses a compact seq-array to store the necessary information in a sequence database. The seqPro structure is designed to efficiently calculate the utilities and upper-bound values of candidate patterns. Furthermore, a new upper bound on utility, called tighter reduced sequence utility (TRSU), and two search-space pruning strategies are used to improve the mining performance of HUSP-SP. Experimental results on both synthetic and real-life datasets show that HUSP-SP can significantly outperform state-of-the-art algorithms in terms of running time, memory usage, search-space pruning efficiency, and scalability.
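
To make "utility" of a sequential pattern concrete, here is a minimal sketch of the standard HUSPM quantities the abstract builds on: a pattern's utility in one quantitative sequence is the maximum over its occurrences, and its database utility is the sum over sequences. We assume, for simplicity, that each pattern element is a single item. The sequence-weighted utility (SWU) shown is the classic loose upper bound; this is not the paper's seqPro structure or its TRSU bound, only the baseline those are designed to compute and tighten. All function names are hypothetical.

```python
from typing import Dict, List, Optional

# A quantitative sequence: a list of itemsets, each mapping item -> utility
# (quantity already multiplied by the item's external utility).
QSequence = List[Dict[str, int]]


def utility_in_sequence(pattern: List[str], seq: QSequence) -> Optional[int]:
    """Maximum utility over all occurrences of `pattern` in `seq`, or None if absent.

    Simplifying assumption: each pattern element is a single item, matched in a
    strictly later itemset than the previous element.
    """
    if not pattern:
        raise ValueError("pattern must be non-empty")
    # best[j]: best utility of a match of pattern[:j + 1] ending at or before the current itemset.
    best: List[Optional[int]] = [None] * len(pattern)
    for itemset in seq:
        # Go right to left so best[j - 1] still refers to earlier itemsets only.
        for j in range(len(pattern) - 1, -1, -1):
            item = pattern[j]
            if item not in itemset:
                continue
            if j == 0:
                candidate = itemset[item]
            elif best[j - 1] is not None:
                candidate = best[j - 1] + itemset[item]
            else:
                continue
            if best[j] is None or candidate > best[j]:
                best[j] = candidate
    return best[-1]


def sequence_utility(seq: QSequence) -> int:
    """Total utility of every item in the sequence."""
    return sum(sum(itemset.values()) for itemset in seq)


def pattern_utility(pattern: List[str], db: List[QSequence]) -> int:
    """Utility of a pattern in the database: sum of its per-sequence utilities."""
    return sum(u for u in (utility_in_sequence(pattern, s) for s in db) if u is not None)


def swu(pattern: List[str], db: List[QSequence]) -> int:
    """Sequence-weighted utility: sum of utilities of sequences containing the pattern."""
    return sum(sequence_utility(s) for s in db if utility_in_sequence(pattern, s) is not None)
```

For example, with the database `[[{"a": 2, "b": 1}, {"a": 4}, {"c": 3}], [{"b": 5}, {"c": 1}], [{"a": 1}, {"c": 2}, {"a": 3}]]`, the pattern `["a", "c"]` has utility 7 + 3 = 10 and SWU 10 + 6 = 16. If a pattern's SWU falls below the minimum utility threshold, no extension of it can be high-utility, so that branch can be pruned; tighter bounds such as TRSU prune more of the search space.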

Beyond Discrete Genres: Mapping News Items onto a Multidimensional Framework of Genre Cues

Zilin Lin , Kasper Welbers , Susan Vermeer , Damian Trilling

Category: Natural Language Processing

2022-12-08

In the contemporary media landscape, with its vast and diverse supply of news, it is increasingly challenging to study such an enormous number of items without a standardized framework. Although attempts have been made to organize and compare news items on the basis of news values, news genres have received little attention, especially genres as perceived by news consumers. Yet perceived news genres are an essential component in exploring how news has developed, as well as a precondition for understanding media effects. We approach this concept by conceptualizing and operationalizing a non-discrete framework for mapping news items in terms of genre cues. As a starting point, we propose a preliminary set of dimensions consisting of "factuality" and "formality". To automatically analyze a large number of news items, we deliver two computational models that score news sentences on these two dimensions. Such predictions can then be used to locate news items within our framework. The proposed approach, which positions news items on a multidimensional grid, helps deepen our insight into the evolving nature of news genres.

Temporal-attentive Covariance Pooling Networks for Video Recognition

Zilin Gao , Qilong Wang , Bingbing Zhang , Qinghua Hu , Peihua Li

Category: Computer Vision | Machine Learning

2021-10-27

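For video recognition tasks, a global representation summarizing the entire content of video snippets plays an important role in final performance. However, existing video architectures usually generate it with a simple global average pooling (GAP) method, which has limited ability to capture the complex dynamics of videos. For image recognition tasks, there is evidence that covariance pooling has stronger representation ability than GAP. Unfortunately, the plain covariance pooling used in image recognition is an orderless representation that cannot model the spatio-temporal structure inherent in videos. This paper therefore proposes Temporal-attentive Covariance Pooling (TCP), inserted at the end of deep architectures, to produce powerful video representations. Specifically, TCP first uses a temporal attention module to adaptively calibrate spatio-temporal features for the subsequent covariance pooling, approximately producing attentive covariance representations. Then, temporal covariance pooling performs temporal pooling of the attentive covariance representations to characterize both the intra-frame correlations and the inter-frame cross-correlations of the calibrated features. In this way, the proposed TCP can capture complex temporal dynamics. Finally, a fast matrix power normalization is introduced to exploit the geometry of the covariance representations. Note that TCP is model-agnostic and can be flexibly integrated into any video architecture, resulting in TCPNet for effective video recognition. Extensive experiments on six benchmarks (e.g., Kinetics, Something-Something V1, and Charades) with various video architectures show that TCPNet clearly outperforms its counterparts while having strong generalization ability. The source code is publicly available.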
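
The difference between global average pooling and second-order (covariance) pooling, and the role of matrix power normalization, can be sketched with NumPy. This is a simplified illustration under our own assumptions, not the authors' TCP code: features are a `(T, N, C)` array (T frames, N spatial positions, C channels), fixed per-frame weights stand in for the learned temporal attention, a cross-covariance between consecutive frames stands in for the inter-frame term, and the matrix square root is approximated with coupled Newton-Schulz iterations in the style of iSQRT-COV. All function names are hypothetical.

```python
import numpy as np


def global_average_pooling(x: np.ndarray) -> np.ndarray:
    """x: (T, N, C) features -> (C,) vector of first-order statistics."""
    return x.mean(axis=(0, 1))


def frame_covariance(f: np.ndarray) -> np.ndarray:
    """f: (N, C) features of one frame -> (C, C) intra-frame covariance."""
    centered = f - f.mean(axis=0, keepdims=True)
    return centered.T @ centered / f.shape[0]


def attentive_temporal_covariance(x: np.ndarray, weights) -> np.ndarray:
    """Weighted average of per-frame covariances; `weights` stand in for temporal attention."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return sum(wt * frame_covariance(f) for wt, f in zip(w, x))


def adjacent_cross_covariance(x: np.ndarray) -> np.ndarray:
    """Average (C, C) cross-covariance between consecutive frames (inter-frame term)."""
    mats = []
    for prev, nxt in zip(x[:-1], x[1:]):
        a = prev - prev.mean(axis=0, keepdims=True)
        b = nxt - nxt.mean(axis=0, keepdims=True)
        mats.append(a.T @ b / prev.shape[0])
    return np.mean(mats, axis=0)


def newton_schulz_sqrt(a: np.ndarray, iters: int = 5) -> np.ndarray:
    """Approximate the square root of an SPD matrix with coupled Newton-Schulz iterations."""
    identity = np.eye(a.shape[0])
    norm = np.trace(a)  # pre-normalize so the iteration converges
    y, z = a / norm, identity
    for _ in range(iters):
        t = 0.5 * (3.0 * identity - z @ y)
        y, z = y @ t, t @ z
    return y * np.sqrt(norm)  # undo the pre-normalization
```

Reversing the frame order leaves the GAP vector unchanged but transposes the cross-covariance term, which is one simple way to see that second-order temporal terms retain order information that GAP discards.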

Taxim: An Example-based Simulation Model for GelSight Tactile Sensors

Zilin Si , Wenzhen Yuan

Category: Robotics

2021-09-09

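Simulation is widely used in robotics for system verification and large-scale data collection. However, simulating sensors, including tactile sensors, has been a long-standing challenge. In this paper, we propose Taxim, a realistic and high-speed simulation model for a vision-based tactile sensor, GelSight. A GelSight sensor uses a piece of soft elastomer as the contact medium and embeds optical structures to capture the deformation of the elastomer, from which the geometry and forces applied at the contact surface can be inferred. We propose an example-based method for simulating GelSight: we simulate the optical response to deformation with a polynomial look-up table. This table maps deformed geometries to the pixel intensities sampled by the embedded camera. To simulate the motion of the surface markers caused by the surface stretch of the elastomer, we apply linear elastic deformation theory and the superposition principle. The simulation model is calibrated with fewer than 100 data points from a real sensor. The example-based approach allows the model to be easily migrated to other GelSight sensors or their variants. To the best of our knowledge, our simulation framework is the first to incorporate marker motion field simulation derived from elastomer deformation together with optical simulation, creating a comprehensive and computationally efficient tactile simulation framework. Experiments show that our optical simulation has the lowest pixel-wise intensity error compared with existing work and can be computed online. Our code and supplementary materials are open-sourced at https://github.com/cmurobotouch/taxim.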
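
The polynomial look-up idea can be sketched as follows, under assumptions of our own: a single intensity channel, a table indexed only by the two surface-gradient components, and a second-order polynomial basis. Taxim's actual parameterization may differ, and its marker-motion simulation is not covered here. The functions are hypothetical: fit the polynomial to calibration samples by least squares, precompute it on a gradient grid, then render a height map by table lookup.

```python
import numpy as np


def poly_features(gx, gy) -> np.ndarray:
    """Second-order polynomial basis in the two surface-gradient components (assumed basis)."""
    gx = np.asarray(gx, dtype=float)
    gy = np.asarray(gy, dtype=float)
    return np.stack([np.ones_like(gx), gx, gy, gx * gx, gx * gy, gy * gy], axis=-1)


def fit_intensity_model(gx, gy, intensity) -> np.ndarray:
    """Least-squares fit of pixel intensity as a polynomial of the local surface gradient."""
    features = poly_features(gx, gy)
    coeffs, *_ = np.linalg.lstsq(features, np.asarray(intensity, dtype=float), rcond=None)
    return coeffs


def build_lookup_table(coeffs: np.ndarray, grad_range: float = 1.0, bins: int = 65):
    """Precompute intensities on a regular (gx, gy) grid so rendering is just indexing."""
    grid = np.linspace(-grad_range, grad_range, bins)
    gx, gy = np.meshgrid(grid, grid, indexing="ij")
    return grid, poly_features(gx, gy) @ coeffs


def render(height_map: np.ndarray, grid: np.ndarray, table: np.ndarray) -> np.ndarray:
    """Map a deformed height map to a simulated single-channel image via nearest-bin lookup."""
    gx, gy = np.gradient(height_map)
    span = grid[-1] - grid[0]

    def to_index(g: np.ndarray) -> np.ndarray:
        idx = np.rint((g - grid[0]) / span * (len(grid) - 1)).astype(int)
        return np.clip(idx, 0, len(grid) - 1)

    return table[to_index(gx), to_index(gy)]
```

Precomputing the table trades a small amount of memory for rendering that needs only a gradient and an array lookup per pixel, in line with the speed goal stated in the abstract.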

Spider: A Large-Scale Human-Labeled Dataset for Complex and Cross-Domain Semantic Parsing and Text-to-SQL Task

Tao Yu , Rui Zhang , Kai Yang , Michihiro Yasunaga , Dongxu Wang , Zifan Li , James Ma , Irene Li , Qingning Yao , Shanelle Roman

Category:

2018-09-24

We present Spider, a large-scale, complex, and cross-domain semantic parsing and text-to-SQL dataset annotated by 11 college students. It consists of 10,181 questions and 5,693 unique complex SQL queries on 200 databases with multiple tables, covering 138 different domains. We define a new complex and cross-domain semantic parsing and text-to-SQL task in which different complex SQL queries and databases appear in the train and test sets. In this way, the task requires the model to generalize well to both new SQL queries and new database schemas. Spider is distinct from most previous semantic parsing tasks, which all use a single database and exactly the same programs in the train and test sets. We experiment with various state-of-the-art models, and the best model achieves only 12.4% exact matching accuracy in a database split setting. This shows that Spider presents a strong challenge for future research. Our dataset and task are publicly available at https://yale-lily.github.io/spider.

Cluster-guided Contrastive Graph Clustering Network

Xihong Yang , Yue Liu , Sihang Zhou , Siwei Wang , Wenxuan Tu , Qun Zheng , Xinwang Liu , Liming Fang , En Zhu

Category: Machine Learning

2023-01-03

Benefiting from its ability to exploit intrinsic supervision information, contrastive learning has recently achieved promising performance in deep graph clustering. However, we observe that two drawbacks of the positive and negative sample construction mechanisms prevent existing algorithms from improving further. 1) The quality of positive samples depends heavily on carefully designed data augmentations, and inappropriate augmentations can easily lead to semantic drift and indiscriminative positive samples. 2) The constructed negative samples are unreliable because they ignore important clustering information. To solve these problems, we propose a Cluster-guided Contrastive deep Graph Clustering network (CCGC) that mines the intrinsic supervision information in high-confidence clustering results. Specifically, instead of performing complex node or edge perturbation, we construct two views of the graph by designing special Siamese encoders whose weights are not shared between the sibling sub-networks. Then, guided by the high-confidence clustering information, we carefully select and construct positive samples from the same high-confidence cluster in the two views. Moreover, to construct semantically meaningful negative sample pairs, we regard the centers of different high-confidence clusters as negative samples, improving the discriminative capability and reliability of the constructed sample pairs. Lastly, we design an objective function that pulls together samples from the same cluster and pushes away those from other clusters by maximizing the cross-view cosine similarity of positive pairs and minimizing it for negative pairs. Extensive experimental results on six datasets demonstrate the effectiveness of CCGC compared with existing state-of-the-art algorithms.
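
A simplified NumPy sketch of the sample construction and objective described above, under stated assumptions: positives are the same high-confidence node seen in the two views, negatives are the cross-view centers of different high-confidence clusters, and the loss is one minus the mean positive cosine similarity plus the mean negative cosine similarity. The function `cluster_guided_loss`, the soft assignment matrix `soft_assign`, and the threshold `tau` are hypothetical; the encoders, the clustering step, and CCGC's exact loss form are not reproduced.

```python
import numpy as np


def _normalize(v: np.ndarray) -> np.ndarray:
    """Scale each row to unit L2 norm so dot products are cosine similarities."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)


def cluster_guided_loss(z1: np.ndarray, z2: np.ndarray, soft_assign: np.ndarray,
                        tau: float = 0.9) -> float:
    """Simplified cluster-guided contrastive loss over two views.

    z1, z2: (n, d) node embeddings from the two (unshared) encoders.
    soft_assign: (n, k) cluster probabilities; nodes whose top probability >= tau
    are treated as high-confidence.
    """
    labels = soft_assign.argmax(axis=1)
    confident = soft_assign.max(axis=1) >= tau
    if not confident.any():
        raise ValueError("no high-confidence nodes; lower tau")
    a, b = _normalize(z1[confident]), _normalize(z2[confident])
    # Positive pairs: the same high-confidence node seen in the two views.
    pos_sim = np.sum(a * b, axis=1).mean()
    # Negative pairs: centers of different high-confidence clusters, compared across views.
    clusters = np.unique(labels[confident])
    c1 = _normalize(np.stack([z1[confident & (labels == k)].mean(axis=0) for k in clusters]))
    c2 = _normalize(np.stack([z2[confident & (labels == k)].mean(axis=0) for k in clusters]))
    sim = c1 @ c2.T
    off_diag = sim[~np.eye(len(clusters), dtype=bool)]
    neg_sim = off_diag.mean() if off_diag.size else 0.0
    # Minimizing this raises positive similarity and lowers negative similarity.
    return float((1.0 - pos_sim) + neg_sim)
```

When the two views agree and clusters are well separated, the loss is near zero; mismatched views drive it toward its upper range.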
